
    Accelerating Verified-Compiler Development with a Verified Rewriting Engine

    Compilers are a prime target for formal verification, since compiler bugs invalidate higher-level correctness guarantees, but compiler changes may become more labor-intensive to implement if they must come with proof patches. One appealing approach is to present compilers as sets of algebraic rewrite rules, which a generic engine can apply efficiently. Each rewrite rule can then be proved separately, with no need to revisit past proofs for other parts of the compiler. We present the first realization of this idea, in the form of a framework for the Coq proof assistant. Our new Coq command takes normal proved theorems and combines them automatically into fast compilers with proofs. We applied our framework to improve the Fiat Cryptography toolchain for generating cryptographic arithmetic, producing an extracted command-line compiler that is about 1000× faster while actually featuring simpler compiler-specific proofs. Comment: 13th International Conference on Interactive Theorem Proving (ITP 2022)

    Omnisemantics: Smooth Handling of Nondeterminism

    This paper gives an in-depth presentation of the omni-big-step and omni-small-step styles of semantic judgments. These styles describe operational semantics by relating starting states to sets of outcomes rather than to individual outcomes. A single derivation of these semantics for a particular starting state and program describes all possible nondeterministic executions (hence the name "omni"), whereas in traditional small-step and big-step semantics, each derivation talks about only one single execution. This restructuring allows for straightforward modeling of languages featuring both nondeterminism and undefined behavior. Specifically, omnisemantics inherently assert safety, i.e. they guarantee that none of the execution branches gets stuck, while traditional semantics need either a separate judgment or additional error markers to specify safety in the presence of nondeterminism. Omnisemantics can be understood as an inductively defined weakest-precondition semantics (or, more generally, a predicate-transformer semantics) that does not involve invariants for loops and recursion, but instead uses unrolling rules as in traditional small-step and big-step semantics. Omnisemantics have already been used in the past, but we believe that they have been under-appreciated and deserve a well-motivated, extensive, and pedagogical presentation of their benefits. We also explore several novel aspects associated with these semantics, in particular their use in type-soundness proofs for lambda calculi, partial-correctness reasoning, and forward proofs of compiler correctness for terminating but potentially nondeterministic programs being compiled to nondeterministic target languages. All results in this paper are formalized in Coq.

    CryptOpt: Verified Compilation with Random Program Search for Cryptographic Primitives

    Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but both with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with a mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations, for the relatively new Intel i9 12G, of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1.
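The optimization loop can be sketched abstractly as hill-climbing with an equivalence-checking gate. This is a schematic Python stand-in of our own devising: the real CryptOpt mutates x86-64 assembly, measures wall-clock time on the target CPU, and uses its formally verified checker as the equivalence test.

```python
import random

def search(program, mutate, cost, equivalent, steps=1000, seed=0):
    """Randomized hill-climbing over a program space: keep a mutation only
    if it is provably equivalent to the original AND measures as cheaper."""
    rng = random.Random(seed)
    best, best_cost = program, cost(program)
    for _ in range(steps):
        candidate = mutate(best, rng)
        # correctness gate first, then the performance comparison
        if equivalent(program, candidate) and cost(candidate) < best_cost:
            best, best_cost = candidate, cost(candidate)
    return best
```

A toy instantiation: treat a "program" as a list of instruction weights, mutation as reordering, equivalence as preserving the multiset of instructions, and cost as a position-weighted sum; the loop then converges on the cheapest ordering.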

    Set It and Forget It! Turnkey ECC for Instant Integration

    Historically, Elliptic Curve Cryptography (ECC) is an active field of applied cryptography where recent focus is on high speed, constant time, and formally verified implementations. While there are a handful of outliers where all these concepts join and land in real-world deployments, these are generally on a case-by-case basis: e.g., a library may feature such X25519 or P-256 code, but not for all curves. In this work, we propose and implement a methodology that fully automates the implementation, testing, and integration of ECC stacks with the above properties. We demonstrate the flexibility and applicability of our methodology by seamlessly integrating it into three real-world projects: OpenSSL, Mozilla's NSS, and the GOST OpenSSL Engine, achieving roughly 9.5x, 4.5x, 13.3x, and 3.7x speedup on any given curve for key generation, key agreement, signing, and verifying, respectively. Furthermore, we showcase the efficacy of our testing methodology by uncovering flaws and vulnerabilities in OpenSSL, and a specification-level vulnerability in a Russian standard. Our work bridges the gap between significant applied-cryptography research results and deployed software, fully automating the process.

    Crafting certified elliptic curve cryptography implementations in Coq

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 103-106).

    Elliptic curve cryptography has become a de facto standard for protecting the privacy and integrity of internet communications. To minimize the operational cost and enable near-universal adoption, increasingly sophisticated implementation techniques have been developed. While the complete specification of an elliptic curve cryptosystem (in terms of middle-school mathematics) fits on the back of a napkin, the fast implementations span thousands of lines of low-level code and are intelligible only to a small group of experts. However, the complexity of the code makes it prone to bugs, which have rendered well-designed security systems completely ineffective. I describe a principled approach for writing crypto code simultaneously with machine-checkable functional-correctness proofs that compose into an end-to-end certificate tying highly optimized C code to the simplest specification used for verification so far. Despite using template-based synthesis for creating low-level code, this workflow offers good control over performance: I was able to match the fastest C implementation of X25519 to within 1% of arithmetic instructions per inner loop and 7% of overall execution time. While the development method itself relies heavily on a proof assistant such as Coq and most techniques are explained through code snippets, every Coq feature is introduced and motivated when it is first used, to accommodate a non-Coq-savvy reader.

    By Andres Erbsen. M. Eng.

    Certifying Derivation of State Machines from Coroutines

    Artifact associated with POPL22 submission "Certifying Derivation of State Machines from Coroutines". It is currently under review following https://popl22.sigplan.org/track/POPL-2022-artifact-evaluation and should be read in combination with the submitted paper.

# Claims

0. Motivating claims about the state of practice in network-protocol state-machine implementation are not supported by the artifact, with the exception that we include a copy of Mozilla NSS and ocaml-tls in the home directory of the virtual machine. These implementations are discussed in the introduction of our paper.
1. Certified compilation: Section 3.3, page 13, line 591 states "Our system automatically proves equivalence between a source program and its compiled version. Moreover, the proof is constructed as we compile a program". Section 3.4 presents a concrete example.
2. Bisimulation: Section 3.3, page 12, line 573 states "These rules are essentially a simplification of the classic technique of bisimulation for our setting. In fact, we proved equivalence with the more standard definition of bisimulation, after expressing our source and target semantics in labeled-transition-system style". The corresponding code is in `~/coroutines/src/ClConv.v`, theorem `equiv_is_bisimulate`.
3. Single-client performance: Section 4.3, line 948 states "with one client thread [..] our derived implementation comes within 50% of the performance of either of the more established alternatives."
4. Performance in comparison to concurrent implementations: Section 4.3, line 960 states "Unsurprisingly, the other implementations with their multicore execution perform several times better than we do, though again it seems we are within the window where an especially paranoid user might prefer our proved server under moderate load." The corresponding Fig. 9 shows Warp handling 5x more requests per second than our server, with 3.5x better latency.
5.
Section 4, line 697 claims that our TLS library implements "just a large enough subset of TLS that we can test with standard Web browsers".

Claims 1, 2, and 5 receive their own sections of this README, in that order. Instructions for performance benchmarking are combined in one section. The Coq development is available at https://github.com/mit-plv/certifying-derivation-of-state-machines-from-coroutines under the MIT license.

# Setup instructions

We provide a 16GB disk image in .vdi format that boots into a Linux terminal environment accessible over SSH (or graphically, if you so wish). You can get VirtualBox from your operating system's repositories or virtualbox.org. Here are instructions for using VirtualBox from the command line:

```sh
cd .  # navigate to the directory containing coroutines.vdi and coroutines.vbox
VBoxManage registervm "$(realpath coroutines.vbox)"
VBoxManage startvm coroutines
```

Wait until a login screen appears and then hide it. You should then be able to connect to the virtual machine using `ssh -p 10022 [email protected]`. Alternatively, you can use the graphical interface.

We provide the three commonly used Coq frontends:

- `vim ~/coroutines/src/ClConv.v` should load CoqTail automatically when a relevant keybinding is used; `\ c l` evaluates to the cursor.
- `emacs ~/coroutines/src/ClConv.v` should load ProofGeneral automatically when a `.v` file is opened; `Ctrl+Enter` evaluates to the cursor.
- `coqide ~/coroutines/src/ClConv.v` should be usable from the graphical interface; `Ctrl+RightArrow` evaluates to the cursor.

Please use your preferred Coq frontend to step through the first lemma in `~/coroutines/src/ClConv.v` to ensure that it is working properly. Further, please also follow the section "Interoperability testing" below to ensure that the extracted Haskell code works as expected.
# Evaluation instructions

## Certified compilation

Attention: compiling TLS.v requires 3 hours of compute time and 37GB of RAM, which is above the limits of the submitted VirtualBox configuration. If you wish to verify it, you can increase the RAM limit under VirtualBox -> Machine -> Settings -> System -> Motherboard. You do not need to compile the file to evaluate the artifact; the build outputs are included.

`~/coroutines/src/ClConv.v` around line 1701 contains lemma `ex_coroutine2_derive`, which shows how our proof-producing compiler is used. The key line is `derive_coro tt` -- it invokes the compiler to prove the goal set up by the previous setup steps. Executing `Print Assumptions ex_coroutine2_derive` after the `Defined` should produce `Closed under the global context`. The meaning of the predicate `equiv` is confirmed in the next subsection.

## Bisimulation

1. `~/coroutines/src/ClConv.v` around line 85 should contain `Record Bisimulation`; the predicate defined there should be recognizable as the standard definition of bisimulation for labeled transition systems.
2. The preceding inductive definitions in the same section package high-level code (coroutines encoded using free monads) and low-level code (dependently typed state machines) as labeled transition systems; these appear in the statement of the bisimulation theorem.
3. Finally, theorem `equiv_is_bisimulate` around line 266 states that our compiler's `equiv` is equivalent to the standard notion of bisimulation.
4. Please evaluate to the end of the section (the line after `End Effect.`) and execute `Print Assumptions equiv_is_bisimulate`. The output should confirm that only the standard axioms `functional_extensionality_dep` and `eq_rect_eq` were used.
## Performance Experiments

The scripts `~/coroutines/server/bench.sh`, `~/coroutines/warpserver/bench.sh`, and `~/coroutines/nginxserver/bench.sh` output performance results to the files `/home/artifact/coroutines/warpserver/bench-warp-c1.txt`, `/home/artifact/coroutines/warpserver/bench-warp-c40.txt`, `/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt`, `/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt`, `/home/artifact/coroutines/server/bench-coroutines-c1.txt`, and `/home/artifact/coroutines/server/bench-coroutines-c40.txt`. Each script should take about a minute to run; they cannot be run concurrently because the servers listen on the same port.

We encourage the artifact evaluators to confirm that these scripts launch the wrk benchmark tool against the corresponding server implementations found in `/home/artifact/coroutines/server/app/Main.hs`, `/home/artifact/coroutines/warpserver/main.hs`, and `/etc/nginx/nginx.conf`. Further, please check that the TLS server file matches the one that was extracted from Coq: `diff -u ~/coroutines/TLS.hs ~/coroutines/server/src/TLS.hs`. (The latter has more imports and a slightly less eager type-error handler, but no code changes.)

### Observed Performance in the VM

We observe that nginx in the VM runs faster than in our pre-submission testing, whereas warp and our implementation exhibit absolute performance similar to that reported in our paper. We also observed tens of percent of variance between the two VM hosts we tried. Here are the numbers from an ultraportable laptop with an Intel Broadwell i7 processor, in the order the bars are presented in Fig. 9.
```
artifact@artifact:~$ find ~/coroutines -name '*-c1.txt' | xargs grep Latency | cut -d' ' -f-9
/home/artifact/coroutines/server/bench-coroutines-c1.txt: Latency 347.78us
/home/artifact/coroutines/warpserver/bench-warp-c1.txt: Latency 128.59us
/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt: Latency 73.54us
artifact@artifact:~$ find ~/coroutines -name '*-c40.txt' | xargs grep Latency | cut -d' ' -f-10
/home/artifact/coroutines/server/bench-coroutines-c40.txt: Latency 7.89ms
/home/artifact/coroutines/warpserver/bench-warp-c40.txt: Latency 2.23ms
/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt: Latency 1.13ms
artifact@artifact:~$ find ~/coroutines -name '*-c1.txt' | xargs grep Requests/ # SINGLE-THREADED THROUGHPUT -- THIS IS THE MOST IMPORTANT BENCHMARK
/home/artifact/coroutines/server/bench-coroutines-c1.txt:Requests/sec: 5410.86
/home/artifact/coroutines/warpserver/bench-warp-c1.txt:Requests/sec: 9895.07
/home/artifact/coroutines/nginxserver/bench-nginx-c1.txt:Requests/sec: 13347.46
artifact@artifact:~$ find ~/coroutines -name '*-c40.txt' | xargs grep Requests/
/home/artifact/coroutines/server/bench-coroutines-c40.txt:Requests/sec: 5156.72
/home/artifact/coroutines/warpserver/bench-warp-c40.txt:Requests/sec: 16339.43
/home/artifact/coroutines/nginxserver/bench-nginx-c40.txt:Requests/sec: 34450.72
```

## Interoperability testing

First, start our demo HTTPS server: `( cd ~/coroutines/server && /usr/bin/time stack run server.crt server.pem )`

Then load it using headless Chrome: `/opt/google/chrome/chrome --disable-gpu --headless --dump-dom https://localhost:4433`

The printed line should contain "Hello!" along with some HTML tags. Alternatively, open the Chrome web browser from the graphical interface of the VM and navigate to https://localhost:4433/, which should display "Hello!".
Or use curl: `curl --tlsv1.3 --cacert ~/coroutines/server/server.crt https://localhost:4433/`

Port 4433 is also forwarded to the VM host in the VirtualBox configuration, so you can test it with your own browser. However, please note that the server uses a self-signed certificate for "localhost", so you would likely need to click through several security warnings about that. Such certificate-related warnings or errors are expected in this usage and should not be taken as a negative indication of the quality of the server implementation.

Here are some errors you might encounter when running the server:

- `server-exe: Network.Socket.bind: resource busy (Address already in use)` -- a server is already running on port 4433; try killing nginx, hs-exe, or server-exe.
- `server-exe: thread blocked indefinitely in an MVar operation` -- this is an implementation bug we did not anticipate during development.
- `server-exe: threadWait: invalid argument (Bad file descriptor)` and `server-exe: Network.Socket.recvBuf: resource vanished (Connection reset by peer)` -- we believe these refer to situations where the client connection was closed in the middle of a request.

# Artifact contents

The main Coq development is located in `~/coroutines/src/`, and the Haskell wrappers are in `~/coroutines/server`, including our copy of hs-tls with some internal APIs exposed at `~/coroutines/server/tls-1.5.3/`. The compiler is located in `~/coroutines/src/ClConv.v`, and the TLS case study is in `~/coroutines/src/TLS.v`, culminating in `main_loop_derive`. Again, that file takes 3 hours and 37GB of RAM to process. Definition `doHandshake` is probably the most instructive to read to understand how TLS is implemented in our library; definition `readWrite` is the record-layer wrapper. `Parameter` directives in `TLS.v` are filled in with appropriate pure functions from `hs-tls` using `Extraction` directives at the end of the file. The main entry point to the compiler is `Ltac derive_coro`.
Coq is installed through OPAM, and Haskell is installed through Stack. Recent versions of curl and wrk are located in `~/.local/bin/`.

    Fast Setup for Proof by Reflection, in Two Lines of Ltac

    © 2018, Springer International Publishing AG, part of Springer Nature. We present a new strategy for performing reification in Coq. That is, we show how to generate first-class abstract syntax trees from “native” terms of Coq’s logic, suitable as inputs to verified compilers or to procedures in the proof-by-reflection style. Our new strategy, based on simple generalization of subterms as variables, is straightforward, short, and fast. In its pure form, it is only complete for constants and function applications, but “let” binders, eliminators, lambdas, and quantifiers can be accommodated through lightweight coding conventions or preprocessing. We survey the existing methods of reification across multiple Coq metaprogramming facilities, describing various design choices and tricks that can be used to speed them up, as well as various limitations. We report benchmarking results for 18 variants, in addition to our own, finding that our reification outperforms 16 of these methods in all cases and one additional method in some cases; writing an OCaml plugin is the only method tested to be faster. Our method is the most concise of the strategies we considered, reifying terms using only two to four lines of Ltac, beyond lists of the identifiers to reify and their reified variants. Additionally, our strategy automatically provides error messages that are no less helpful than Coq’s own error messages.
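For readers new to reification, its input/output relationship can be demonstrated outside Coq with a small Python sketch that records a first-class syntax tree by evaluating a "native" function on symbolic values. Note this operator-overloading mechanism is different from the paper's Ltac strategy of generalizing subterms as variables; it only illustrates what "turning a native term into an AST" means.

```python
class Sym:
    """Evaluating a native arithmetic function on Sym values records an AST."""
    def __init__(self, ast):
        self.ast = ast
    def __add__(self, other):
        return Sym(("add", self.ast, _lift(other)))
    def __radd__(self, other):
        return Sym(("add", _lift(other), self.ast)) # 1 + x, reflected operand
    def __mul__(self, other):
        return Sym(("mul", self.ast, _lift(other)))
    def __rmul__(self, other):
        return Sym(("mul", _lift(other), self.ast)) # 2 * x, reflected operand

def _lift(x):
    """Wrap plain constants as AST leaves."""
    return x.ast if isinstance(x, Sym) else ("const", x)

def reify(f):
    """Turn a unary 'native' function into a first-class syntax tree."""
    return f(Sym(("var", "x"))).ast
```

The resulting tuples are the kind of first-class syntax trees a verified compiler or reflective decision procedure consumes.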
